
fix(zarr-metadata): model stored metadata more closely #3962

Merged
d-v-b merged 9 commits into zarr-developers:main from d-v-b:update-zarr-metadata
May 12, 2026

d-v-b (Contributor) commented May 10, 2026

Zarr V2 uses a separate JSON document named `.zattrs` for the attributes of an array or group.
This package was inconsistent about how it modelled this fact. The array metadata document type modelled
the array fields (`shape`, `dtype`, etc.), which would be stored in `.zarray`, AND the `attributes` field,
which would be stored in `.zattrs`. Thus the array metadata model matched the representation
of an array that a program might use, rather than the stored layout. But the group metadata type didn't
follow this pattern: it has no `attributes` field.

This PR addresses that inconsistency by adding an optional `attributes` field to `GroupMetadataV2`.
To model the stored representation of V2 data, this PR also adds three new types, `ZArrayMetadata`,
`ZGroupMetadata`, and `ZAttrsMetadata`, which closely model the contents of the `.zarray`, `.zgroup`, and
`.zattrs` documents, respectively.

This change makes the V2 consolidated metadata type more accurate, because consolidated metadata for Zarr V2
consists of inlined metadata documents.
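In zarr-python 2.x, consolidated metadata is a single `.zmetadata` JSON document holding a `zarr_consolidated_format` version (1) and a `metadata` object that maps each store key, such as `foo/.zarray`, to the inlined document. A minimal sketch of building it, assuming the store is a plain dict of key to JSON bytes; `consolidate` and `METADATA_NAMES` are hypothetical helpers, not part of `zarr_metadata`:

```python
import json
from typing import Any

# The three V2 metadata document names that get inlined.
METADATA_NAMES = (".zarray", ".zgroup", ".zattrs")


def consolidate(store: dict[str, bytes]) -> dict[str, Any]:
    """Inline every V2 metadata document in `store` into one JSON object.

    `store` is a hypothetical flat mapping of key -> raw bytes; chunk keys
    such as "foo/0" are skipped because they are not metadata documents.
    """
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in METADATA_NAMES
    }
    return {"zarr_consolidated_format": 1, "metadata": metadata}
```

Each value under `metadata` is exactly one `.zarray`, `.zgroup`, or `.zattrs` document, so a type built from per-document models describes it more faithfully than one built from merged array-plus-attributes models.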

d-v-b and others added 4 commits May 10, 2026 10:36
…Metadata at top level

The on-disk file types added in 8b7af90 were importable from the
`v2` submodule but not from the package root. Add them to the top-level
`__init__.py` so consumers can import them as `zarr_metadata.ZArrayMetadata`,
etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions (bot) added the "needs release notes" label May 10, 2026
d-v-b requested a review from ilan-gold May 10, 2026 19:03
codecov (bot) commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.28%. Comparing base (1020ca5) to head (5e32cdf).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3962   +/-   ##
=======================================
  Coverage   93.28%   93.28%           
=======================================
  Files          87       87           
  Lines       11745    11745           
=======================================
  Hits        10956    10956           
  Misses        789      789           
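Codecov's headline percentage is hits divided by coverable lines. A quick arithmetic check with the figures from the table above, which are identical on `main` and on this PR (hence the zero diff):

```python
# Figures from the Codecov table (same on base and head).
hits, misses, lines = 10956, 789, 11745

# Every coverable line is either hit or missed.
assert hits + misses == lines

coverage = 100 * hits / lines
print(f"Coverage: {coverage:.2f}%")  # prints "Coverage: 93.28%"
```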

d-v-b (Contributor, Author) commented May 10, 2026

cc @chuckwondo

Comment thread packages/zarr-metadata/src/zarr_metadata/v2/array.py
Comment thread packages/zarr-metadata/src/zarr_metadata/v2/group.py
d-v-b merged commit eac9c86 into zarr-developers:main May 12, 2026
37 checks passed
d-v-b deleted the update-zarr-metadata branch May 12, 2026 20:22
maxrjones mentioned this pull request May 13, 2026
d-v-b added a commit to d-v-b/zarr-python that referenced this pull request May 14, 2026
uv.lock was removed in zarr-developers#3962 as unused. The mypy-via-hatch change in
this branch makes it load-bearing again: it is the single source of
truth that keeps the `dev` hatch environment (and therefore mypy's
results) consistent across developer machines and CI. Restore it,
regenerated against the current pyproject.toml.
d-v-b added a commit that referenced this pull request May 15, 2026
* chore: ignore docs/superpowers/ scratch directory

* chore: pin python 3.12 on hatch dev env

* chore: run mypy from hatch dev env, drop mirrors-mypy hook

Replace the pre-commit/mirrors-mypy hook (which maintained its own
duplicate dep list) with a `repo: local` hook that runs
`hatch run dev:mypy`. The dev hatch env's `dev` group (resolved via
uv.lock) becomes the single source of truth for mypy's dependency set.

This also unpins numpy from the type-check environment (it was
hard-pinned to `numpy==2.1` in the old hook); type fixes that follow
keep mypy clean against current numpy stubs:

  - relax NDArrayLike.reshape/all signatures so np.ndarray
    structurally satisfies the protocol
  - widen AsyncGroup.require_array's `dtype` to include None
  - add narrowly-scoped `# type: ignore` comments with explanatory
    notes where numpy 2.x stubs are too strict against runtime-valid
    calls (datetime64 unit f-strings, 'generic' unit sentinel,
    newbyteorder subclass identity, ZDTypeLike None handling)
  - drop stale `# type: ignore` comments that are no longer needed

* ci: install hatch in lint workflow so mypy hook can run

* docs: changelog for mypy-in-dev-env change

* refactor: resolve None dtype at create() boundary

`create()` accepts `dtype=None` (legacy v2 behavior: an unspecified
dtype defaults to float64). Previously this `None` was forwarded
untyped into `_create`, which doesn't accept `None` — it only worked
because `parse_dtype(None)` -> `np.dtype(None)` happens to resolve to
float64. That required a `cast()` to silence mypy.

Resolve `None` to `"float64"` explicitly in `create()` before
forwarding, so the value passed to `_create` is a real dtype and the
cast is no longer needed. No behavior change.

* refactor: give NDArrayLike.reshape/all precise signatures

The initial fix for numpy-stub conformance widened the NDArrayLike
protocol's `reshape` and `all` to `(*args: Any, **kwargs: Any) -> Any`,
which erased type information for every consumer of the protocol.

Replace with precise signatures that np.ndarray still satisfies
structurally:

  - `reshape(shape: tuple[int, ...], /, *, order=..., copy=...)
    -> NDArrayLike` — the `Literal[-1]` form was the only thing
    blocking a precise signature (it straddles numpy's arity-split
    overloads); it is unused on protocol-typed values, so drop it.
    `NDBuffer.reshape` keeps its public `-1` support by normalizing
    `-1` to `(-1,)` before forwarding.
  - `all(self) -> np.bool_` — the sole caller wraps the result in
    `bool(...)`, and no-arg is all we use.

* chore: remove gitignore for claude docs

* chore: restore uv.lock

uv.lock was removed in #3962 as unused. The mypy-via-hatch change in
this branch makes it load-bearing again: it is the single source of
truth that keeps the `dev` hatch environment (and therefore mypy's
results) consistent across developer machines and CI. Restore it,
regenerated against the current pyproject.toml.

* docs: rename changelog entry to PR #3972

* ci: skip mypy hook on pre-commit.ci

The mypy hook is now `language: system` and shells out to
`hatch run dev:mypy`, which needs the project's hatch dev environment.
pre-commit.ci's hosted runners don't have it, so the hook can only
fail there. Add it to `ci.skip`; mypy is still covered by the Lint
GitHub Actions workflow (which installs hatch) and by local prek runs.

* refactor: apply review nitpicks from PR #3972

- Inline the float64 dtype default into the `_create` call instead of
  reassigning the `dtype` variable.
- Move the numpy 2.x stub explanation onto its own line above the code
  so `# type: ignore` comment lines stay short.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: run mypy via `uv run` so the lockfile is actually honored

`hatch run dev:mypy` does not consume `uv.lock` — hatch has no lockfile
support and re-resolves the `dev` dependency group from scratch each time
it builds the environment. This defeated the PR's goal of a reproducible
type-checking environment: contributors with stale or differently-resolved
hatch `dev` envs saw different mypy results (e.g. errors from an older
`tomlkit` whose `TOMLDocument.__getitem__` was typed `Item | Container`
rather than `Any`).

Switch the mypy pre-commit hook and the Lint workflow to `uv run --frozen
mypy`. `uv` does sync from `uv.lock`, so the committed lockfile becomes the
real single source of truth for mypy's dependency set, identical for every
contributor and for CI.

- .pre-commit-config.yaml: hook entry `hatch run dev:mypy` -> `uv run --frozen mypy`
- .github/workflows/lint.yml: install `uv` instead of `hatch`
- pyproject.toml / changes: update wording to match

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @maxrjones

Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>

* fix: test dtype is None exactly, not via falsy collapse

`dtype or "float64"` substitutes the default for any falsy input —
empty string, 0, empty Mapping — not just None. Those wouldn't pass
ZDTypeLike validation anyway, but the failure mode was "silent
substitution to float64" instead of "raise on invalid input".
Use an exact `is None` check expressed as a conditional expression.
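The difference is easy to see side by side. The two helpers below are hypothetical; they isolate the defaulting expression and leave out zarr's actual dtype validation.

```python
from typing import Any


def default_with_or(dtype: Any) -> Any:
    # Replaces ANY falsy value (None, "", 0, {}) with "float64".
    return dtype or "float64"


def default_with_is_none(dtype: Any) -> Any:
    # Replaces only None; other falsy values pass through unchanged, so a
    # later validation step can reject them instead of silently using float64.
    return "float64" if dtype is None else dtype


for value in (None, "", 0, {}):
    print(repr(value), default_with_or(value), repr(default_with_is_none(value)))
```

With `or`, an empty string quietly becomes `float64`; with the `is None` check it reaches validation unchanged and fails there, which is the behavior the commit is after.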

* chore: add .python-version pinning default to 3.12

uv reads `.python-version` to decide which interpreter to use for
`uv venv` / `uv sync` / `uv run`. With the mypy hook now running as
`uv run --frozen mypy`, pinning the interpreter here keeps the dev
env consistent across developer machines — matching the existing
`[tool.mypy].python_version = "3.12"` and `requires-python = ">=3.12"`
declarations.

`.python-version` is not consumed by hatch (its envs declare their
own Python via `[tool.hatch.envs.*].python`), so the test matrix
(py3.12/3.13/3.14) is unaffected.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>

Labels

needs release notes (automatically applied to PRs which haven't added release notes)

2 participants